Optimal Histograms for Hierarchical Range Queries (Extended Abstract)

Authors

  • Nick Koudas
  • S. Muthukrishnan
  • Divesh Srivastava
Abstract

Now there is tremendous interest in data warehousing and OLAP applications. OLAP applications typically view data as having multiple logical dimensions (e.g., product, location) with natural hierarchies defined on each dimension, and analyze the behavior of various measure attributes (e.g., sales, volume) in terms of the dimensions. OLAP queries typically involve hierarchical selections on some of the dimensions (e.g., product is classified under the jeans product category, or location is in the north-east region), often aggregating measure attributes (see, e.g., [6]). Cost-based query optimization of such OLAP queries needs good estimates of the selectivity of hierarchical selections. Histograms capture attribute value distribution statistics in a space-efficient fashion. They have been designed to work well for numeric attribute value domains, and have long been used to support cost-based query optimization in databases [11, 9, 2, 4, 10, 5]. Histograms can be used to estimate the selectivity of OLAP queries by modeling the (hierarchical) conditions on a given dimension as a set of hierarchical ranges (i.e., two ranges are either disjoint or one is contained in the other), and using standard range selectivity estimation techniques (see, e.g., [10]). The quality of selectivity estimates obtained using a histogram depends on computing a good solution to the histogram construction problem, and there has been considerable recent effort in this area (see, e.g., [10, 5]). However, while OLAP queries make extensive use of hierarchical selection conditions, previous works on computing good histograms, for the most part, consider only equality queries when computing the error incurred by a particular choice of histogram bucket boundaries. This mismatch between the nature of OLAP queries, and the class of queries considered when constructing histograms can result in poor selectivity estimates for OLAP queries.
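The setting described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the uniform-within-bucket estimate, and the sum-squared-error measure are all assumptions chosen for illustration. It checks that a set of ranges is hierarchical, estimates range selectivity from a bucketized histogram, and shows how the choice of bucket boundaries changes the error over a hierarchical query set.

```python
def is_hierarchical(ranges):
    """True if every pair of inclusive (lo, hi) ranges is disjoint or nested."""
    for i, (a_lo, a_hi) in enumerate(ranges):
        for b_lo, b_hi in ranges[i + 1:]:
            disjoint = a_hi < b_lo or b_hi < a_lo
            nested = (a_lo <= b_lo and b_hi <= a_hi) or (b_lo <= a_lo and a_hi <= b_hi)
            if not (disjoint or nested):
                return False
    return True


def build_histogram(freqs, starts):
    """Build buckets (lo, hi, average frequency) from bucket start indices.

    `starts` is a hypothetical parameter: sorted start positions, beginning at 0.
    """
    buckets = []
    for k, lo in enumerate(starts):
        hi = starts[k + 1] - 1 if k + 1 < len(starts) else len(freqs) - 1
        avg = sum(freqs[lo:hi + 1]) / (hi - lo + 1)
        buckets.append((lo, hi, avg))
    return buckets


def estimate_range(buckets, lo, hi):
    """Estimate the count in [lo, hi], assuming values are uniform within a bucket."""
    est = 0.0
    for b_lo, b_hi, avg in buckets:
        overlap = min(hi, b_hi) - max(lo, b_lo) + 1
        if overlap > 0:
            est += overlap * avg
    return est


def range_sse(freqs, buckets, ranges):
    """Sum-squared error over a query set (an assumed error measure)."""
    err = 0.0
    for lo, hi in ranges:
        true_count = sum(freqs[lo:hi + 1])
        err += (true_count - estimate_range(buckets, lo, hi)) ** 2
    return err


# Toy data: frequencies over 8 leaf values of one dimension.
freqs = [10, 10, 2, 2, 50, 50, 50, 50]
# A small hierarchy: the root, two regions, and two sub-regions.
queries = [(0, 7), (0, 3), (4, 7), (0, 1), (2, 3)]

coarse = build_histogram(freqs, [0, 4])     # two buckets
fine = build_histogram(freqs, [0, 2, 4])    # three buckets
print(is_hierarchical(queries), range_sse(freqs, coarse, queries),
      range_sse(freqs, fine, queries))
```

In this toy case the two-bucket histogram answers the root and region queries exactly but misestimates the two sub-region queries, while the three-bucket histogram answers every query in the set exactly. Measuring error against the hierarchical query set rather than against equality queries is the mismatch the abstract points to.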
In this paper, we address this problem and focus on efficiently computing optimal histograms for the case of hierarchical range queries. We make the following contributions:

Related resources

Approximating sliding windows by cyclic tree-like histograms for efficient range queries

The issue of providing fast approximate answers to range queries on sliding windows with a small consumption of storage space is one of the main challenges in the context of data streams. On the one hand, the importance of this class of queries is widely accepted. They are indeed useful to compute aggregate information over the data stream, allowing us to extract from it more abstract knowledge...

Fast range query estimation by N-level tree histograms

Histograms are a lossy compression technique widely applied in various application contexts, such as query optimization, statistical and temporal databases, OLAP applications, and data streams. In most cases, accuracy in reconstructing some original information from the histogram plays a crucial role. Thus, several proposals for constructing histograms trying to maximize their accuracy, ha...

CR-precis: A deterministic summary structure for update data streams ([cs.DS], 17 Sep 2006)

We present the CR-precis structure, that is a general-purpose, deterministic and sub-linear data structure for summarizing update data streams. The CR-precis structure yields the first deterministic sub-linear space/time algorithms for answering a variety of fundamental queries over update streams, such as, (a) point queries, (b) range queries, (c) finding approximate frequent items, (d) findin...

CR-precis: A deterministic summary structure for update data streams (arXiv:cs/0609032v1 [cs.DS], 7 Sep 2006)

We present the CR-precis structure, that is a general-purpose, deterministic and sub-linear data structure for summarizing update data streams. The CR-precis structure yields the first deterministic sub-linear space/time algorithms for update streams for answering a variety of fundamental stream queries, such as, (a) point queries, (b) range queries, (c) finding approximate frequent items, (d) ...

Understanding Hierarchical Methods for Differentially Private Histograms

In recent years, many approaches to publishing histograms under differential privacy have been proposed. Several approaches rely on constructing tree structures in order to decrease the error when answering large range queries. In this paper, we examine the factors affecting the accuracy of hierarchical approaches by studying the mean squared error (MSE) when answering range queries. We start with one-...

Publication date: 2000